MIPA: Mutual Information Based Paraphrase Acquisition via Bilingual Pivoting
نویسندگان
چکیده
We present a pointwise mutual information (PMI) based approach for formalizing paraphrasability and propose a variant of PMI, called mutual information based paraphrase acquisition (MIPA), for paraphrase acquisition. Our paraphrase acquisition method first acquires lexical paraphrase pairs by bilingual pivoting and then reranks them by PMI and distributional similarity. The complementary nature of information from bilingual corpora and frommonolingual corpora renders the proposed method robust. Experimental results show that the proposed method substantially outperforms bilingual pivoting and distributional similarity themselves in terms of metrics such as mean reciprocal rank, mean average precision, coverage, and Spearman’s correlation.
منابع مشابه
Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity
This paper improves an existing bilingual paraphrase extraction technique using monolingual distributional similarity to rerank candidate paraphrases. Raw monolingual data provides a complementary and orthogonal source of information that lessens the commonly observed errors in bilingual pivotbased methods. Our experiments reveal that monolingual scoring of bilingually extracted paraphrases has...
متن کاملParaQuery: Making Sense of Paraphrase Collections
Pivoting on bilingual parallel corpora is a popular approach for paraphrase acquisition. Although such pivoted paraphrase collections have been successfully used to improve the performance of several different NLP applications, it is still difficult to get an intrinsic estimate of the quality and coverage of the paraphrases contained in these collections. We present ParaQuery, a tool that helps...
متن کاملDomain-Specific Paraphrase Extraction
The validity of applying paraphrase rules depends on the domain of the text that they are being applied to. We develop a novel method for extracting domainspecific paraphrases. We adapt the bilingual pivoting paraphrase method to bias the training data to be more like our target domain of biology. Our best model results in higher precision while retaining complete recall, giving a 10% relative ...
متن کاملEnlarging Paraphrase Collections through Generalization and Instantiation
This paper presents a paraphrase acquisition method that uncovers and exploits generalities underlying paraphrases: paraphrase patterns are first induced and then used to collect novel instances. Unlike existing methods, ours uses both bilingual parallel and monolingual corpora. While the former are regarded as a source of high-quality seed paraphrases, the latter are searched for paraphrases t...
متن کاملMultilingual WSD-like Constraints for Paraphrase Extraction
The use of pivot languages and wordalignment techniques over bilingual corpora has proved an effective approach for extracting paraphrases of words and short phrases. However, inherent ambiguities in the pivot language(s) can lead to inadequate paraphrases. We propose a novel approach that is able to extract paraphrases by pivoting through multiple languages while discriminating word senses in ...
متن کامل